Abstract. This paper addresses the problem of fair equilibrium selection
in graphical games. Our approach is based on the data structure
called the best response policy, which was proposed by Kearns et al. [12] 
as a way to represent all Nash equilibria of a graphical game. In [9], it 
was shown that the best response policy has polynomial size as long as 
the underlying graph is a path. In this paper, we show that if the under- 
lying graph is a bounded-degree tree and the best response policy has 
polynomial size then there is an efficient algorithm which constructs a 
Nash equilibrium that guarantees certain payoffs to all participants. An- 
other attractive solution concept is a Nash equilibrium that maximizes 
the social welfare. We show that, while exactly computing the latter 
is infeasible (we prove that solving this problem may involve algebraic
numbers of an arbitrarily high degree), there exists an FPTAS for finding
such an equilibrium as long as the best response policy has polynomial 
size. These two algorithms can be combined to produce Nash equilibria 
that satisfy various fairness criteria. 



1 Introduction 

In a large community of agents, an agent's behavior is not likely to have a direct 
effect on most other agents: rather, it is just the agents who are close enough to 
him that will be affected. However, as these agents respond by adapting their 
behavior, more agents will feel the consequences and eventually the choices made 
by a single agent will propagate throughout the entire community. 

This is the intuition behind graphical games, which were introduced by Kearns,
Littman and Singh in [12] as a compact representation scheme for games with
many players. In an n-player graphical game, each player is associated with a
vertex of an underlying graph G, and the payoffs of each player depend on his
action as well as on the actions of his neighbors in the graph. If the maximum
degree of G is Δ, and each player has two actions available to him, then the
game can be represented using n2^{Δ+1} numbers. In contrast, we need n2^n
numbers to represent a general n-player 2-action game, which is only practical
for small values of n. For graphical games with constant Δ, the size of the
game is linear in n.
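
To make the gap between the two representations concrete, the counts above can be compared directly. This is only an illustrative sketch; the function names are hypothetical and not from the paper.

```python
def graphical_game_size(n, max_degree):
    """Number of payoff entries of an n-player 2-action graphical game:
    each player has one entry per joint action of itself and its neighbors."""
    return n * 2 ** (max_degree + 1)


def normal_form_size(n):
    """Number of payoff entries of a general n-player 2-action game."""
    return n * 2 ** n
```

For example, with n = 30 and maximum degree 3 the graphical representation needs 480 numbers, whereas the general representation needs 30 · 2^30.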

One of the most natural problems for a graphical game is that of finding a 
Nash equilibrium, the existence of which follows from Nash's celebrated theorem 
(as graphical games are just a special case of n-player games). The first attempt 
to tackle this problem was made in [12], where the authors consider graphical 
games with two actions per player in which the underlying graph is a
bounded-degree tree. They propose a generic algorithm for finding Nash
equilibria that can be specialized in two ways: an exponential-time algorithm
for finding an (exact) Nash equilibrium, and a fully polynomial time
approximation scheme (FPTAS) for finding an approximation to a Nash
equilibrium. For any ε > 0, this algorithm outputs an ε-Nash equilibrium, which
is a strategy profile in which no player can improve his payoff by more than ε
by unilaterally changing his strategy.

While ε-Nash equilibria are often easier to compute than exact Nash
equilibria, this solution concept has several drawbacks. First, the players may be
sensitive to a small loss in payoffs, so the strategy profile that is an ε-Nash
equilibrium will not be stable. This will be the case even if there is only a small subset
of players who are extremely price-sensitive, and for a large population of
players it may be difficult to choose a value of ε that will satisfy everyone. Second,
the strategy profiles that are close to being Nash equilibria may be much better
with respect to the properties under consideration than exact Nash equilibria.
Therefore, the (approximation to the) value of the best solution that corresponds
to an ε-Nash equilibrium may not be indicative of what can be achieved under
an exact Nash equilibrium. This is especially important if the purpose of the
approximate solution is to provide a good benchmark for a system of selfish
agents, as the benchmark implied by an ε-Nash equilibrium may be unrealistic.
For these reasons, in this paper we focus on the problem of computing exact
Nash equilibria.

Building on ideas of [13], Elkind et al. [9] showed how to find an (exact) Nash 
equilibrium in polynomial time when the underlying graph has degree 2 (that is, 
when the graph is a collection of paths and cycles). By contrast, finding a Nash 
equilibrium in a general degree-bounded graph appears to be computationally 
intractable: it has been shown (see [5,11,7]) to be complete for the complexity 
class PPAD. [9] extends this hardness result to the case in which the underlying 
graph has bounded pathwidth. 

A graphical game may not have a unique Nash equilibrium; indeed, it may
have exponentially many. Moreover, some Nash equilibria are more desirable
than others. Rather than having an algorithm which merely finds some Nash 
equilibrium, we would like to have algorithms for finding Nash equilibria with 
various socially-desirable properties, such as maximizing overall payoff or dis- 
tributing profit fairly. 

A useful property of the data structure of [12] is that it simultaneously repre- 
sents the set of all Nash equilibria of the underlying game. If this representation 
has polynomial size (as is the case for paths, as shown in [9]), one may hope to
extract from it a Nash equilibrium with the desired properties. In fact, in [12]
the authors mention that this is indeed possible if one is interested in finding
an (approximate) ε-Nash equilibrium. The goal of this paper is to extend this to
exact Nash equilibria.

1.1 Our Results 

In this paper, we study n-player 2-action graphical games on bounded-degree 
trees for which the data structure of [12] has size poly(n). We focus on the prob- 
lem of finding exact Nash equilibria with certain socially-desirable properties. In 
particular, we show how to find a Nash equilibrium that (nearly) maximizes the 
social welfare, i.e., the sum of the players' payoffs, and we show how to find a 
Nash equilibrium that (nearly) satisfies prescribed payoff bounds for all players. 

Graphical games on bounded-degree trees have a simple algebraic structure. 
One attractive feature, which follows from [12], is that every such game has a 
Nash equilibrium in which the strategy of every player is a rational number. 
Section 3 studies the algebraic structure of those Nash equilibria that maximize 
social welfare. We show (Theorems 1 and 2) that, surprisingly, the set of Nash 
equilibria that maximize social welfare is more complex. In fact, for any algebraic
number α ∈ [0, 1] with degree at most n, we exhibit a graphical game on a path of
length O(n) such that, in the unique social welfare-maximizing Nash equilibrium
of this game, one of the players plays the mixed strategy α.¹ This result shows
that it may be difficult to represent an optimal Nash equilibrium. It seems to be 
a novel feature of the setting we consider here, that an optimal Nash equilibrium 
is hard to represent, in a situation where it is easy to find and represent a Nash 
equilibrium. 

As the social welfare-maximizing Nash equilibrium may be hard to represent 
efficiently, we have to settle for an approximation. However, the crucial difference 
between our approach and that of previous papers [12, 15, 18] is that we require 
our algorithm to output an exact Nash equilibrium, though not necessarily the 
optimal one with respect to our criteria. In Section 4, we describe an algorithm 
that satisfies this requirement. Namely, we propose an algorithm that for any
ε > 0 finds a Nash equilibrium whose total payoff is within ε of optimal. It
runs in polynomial time (Theorems 3 and 4) for any graphical game on a
bounded-degree tree for which the data structure proposed by [12] (the so-called
best response policy, defined below) is of size poly(n) (note that, as shown in
[9], this is always the case when the underlying graph is a path). More
precisely, the running time of our algorithm is polynomial in n, P_max, and
1/ε, where P_max is the maximum absolute value of an entry of a payoff matrix,
i.e., it is a pseudopolynomial algorithm, though it is fully polynomial with
respect to ε. We show (Section 4.1) that under some restrictions on the payoff
matrices, the algorithm can be transformed into a (truly) polynomial-time
algorithm that outputs a Nash equilibrium whose total payoff is within a 1 − ε
factor from the optimal.

¹ A related result in a different context was obtained by Datta [8], who shows that
n-player 2-action games are universal in the sense that any real algebraic variety can
be represented as the set of totally mixed Nash equilibria of such games.

In Section 5, we consider the problem of finding a Nash equilibrium in which
the expected payoff of each player V_i exceeds a prescribed threshold T_i. Using the
idea from Section 4 we give (Theorem 5) a fully polynomial time approximation
scheme for this problem. The running time of the algorithm is bounded by a
polynomial in n, P_max, and 1/ε. If the instance has a Nash equilibrium satisfying
the prescribed thresholds then the algorithm constructs a Nash equilibrium in
which the expected payoff of each player V_i is at least T_i − ε.

In Section 6, we introduce other natural criteria for selecting a "good" Nash 
equilibrium and we show that the algorithms described in the two previous sec- 
tions can be used as building blocks in finding Nash equilibria that satisfy these 
criteria. In particular, in Section 6.1 we show how to find a Nash equilibrium 
that approximates the maximum social welfare, while guaranteeing that each 
individual payoff is close to a prescribed threshold. In Section 6.2 we show how 
to find a Nash equilibrium that (nearly) maximizes the minimum individual pay- 
off. Finally, in Section 6.3 we show how to find a Nash equilibrium in which the 
individual payoffs of the players are close to each other. 

1.2 Related Work 

Our approximation scheme (Theorem 3 and Theorem 4) shows a contrast be- 
tween the games that we study and two-player n-action games, for which the 
corresponding problems are usually intractable. For two-player n-action games, 
the problem of finding Nash equilibria with special properties is typically NP- 
hard. In particular, this is the case for Nash equilibria that maximize the social 
welfare [10,6]. Moreover, it is likely to be intractable even to approximate such 
equilibria. In particular, Chen, Deng and Teng [4] show that there exists some ε,
inverse polynomial in n, for which computing an ε-Nash equilibrium in 2-player
games with n actions per player is PPAD-complete. 

Lipton and Markakis [14] study the algebraic properties of Nash equilibria, 
and point out that standard quantifier elimination algorithms can be used to 
solve them. Note that these algorithms are not polynomial-time in general. The 
games we study in this paper have polynomial-time computable Nash equilib- 
ria in which all mixed strategies are rational numbers, but an optimal Nash 
equilibrium may necessarily include mixed strategies with high algebraic degree. 

A correlated equilibrium (CE) (introduced by Aumann [2]) is a distribution 
over vectors of players' actions with the property that if any player is told his 
own action (the value of his own component) from a vector generated by that 
distribution, then he cannot increase his expected payoff by changing his action. 
Any Nash equilibrium is a CE but the converse does not hold in general. In 
contrast with Nash equilibria, correlated equilibria can be found for low-degree 
graphical games (as well as other classes of concisely-represented multiplayer 
games) in polynomial time [16]. But, for graphical games it is NP-hard to find 
a correlated equilibrium that maximizes total payoff [17]. However, the
NP-hardness results apply to more general games than the ones we consider here;
in particular, the graphs are not trees. From [2] it is also known that there exist
2-player, 2-action games for which the expected total payoff of the best correlated
equilibrium is higher than that of the best Nash equilibrium, and we discuss this issue
further in Section 7.

2 Preliminaries and Notation 

We consider graphical games in which the underlying graph G is an n-vertex
tree, in which each vertex has at most Δ children. Each vertex has two actions,
which are denoted by 0 and 1. A mixed strategy of a player V is represented as
a single number v ∈ [0, 1], which denotes the probability that V selects action 1.

For the purposes of the algorithm, the tree is rooted arbitrarily. For conve- 
nience, we assume without loss of generality that the root has a single child, 
and that its payoff is independent of the action chosen by the child. This can 
be achieved by first choosing an arbitrary root of the tree, and then adding a 
dummy "parent" of this root, giving the new parent a constant payoff function, 
e.g., 0. 

Given an edge (V, W) of the tree G, and a mixed strategy w for W, let
G_{(V,W),W=w} be the instance obtained from G by (1) deleting all nodes Z which
are separated from V by W (i.e., all nodes Z such that the path from Z to V
passes through W), and (2) restricting the instance so that W is required to
play mixed strategy w.

Definition 1. Suppose that (V, W) is an edge of the tree, that v is a mixed
strategy for V and that w is a mixed strategy for W. We say that v is a potential
best response to w (denoted by v ∈ pbr_V(w)) if there is an equilibrium in
the instance G_{(V,W),W=w} in which V has mixed strategy v. We define the best
response policy for V, given W, as B(W, V) = {(w, v) | v ∈ pbr_V(w), w ∈ [0, 1]}.

The upstream pass of the generic algorithm of [12] considers every node V 
(other than the root) and computes the best response policy for V given its 
parent. With the above assumptions about the root, the downstream pass is 
straightforward. The root selects a mixed strategy w for the root W and a
mixed strategy v with (w, v) ∈ B(W, V) for each child V of W. It instructs each
child V to play v. The remainder of the downstream pass is recursive. When a node
V is instructed by its parent to adopt mixed strategy v, it does the following
for each child U: it finds a pair (v, u) ∈ B(V, U) (with the same v value that it
was given by its parent) and instructs U to play u.
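
The downstream pass described above can be sketched as follows. This is a minimal illustration, not the implementation of [12]: it assumes that every best response policy has already been computed and is stored as a list of rectangles, and the names `children`, `policy` and `downstream_pass`, as well as the choice of always taking the lower endpoint of the child's interval, are assumptions made for the example.

```python
from fractions import Fraction


def downstream_pass(root, root_strategy, children, policy):
    """Assign a mixed strategy to every node, top-down.

    children[v] lists the children of node v; policy[(v, u)] encodes B(V, U)
    as a list of rectangles ((v_lo, v_hi), (u_lo, u_hi)). Both encodings are
    assumptions made for this sketch.
    """
    strategy = {root: root_strategy}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in children.get(v, []):
            # Look for a rectangle whose V-interval contains the value that V
            # was instructed to play; any point of its U-interval then gives
            # a pair (v, u) in B(V, U).
            for (v_lo, v_hi), (u_lo, u_hi) in policy[(v, u)]:
                if v_lo <= strategy[v] <= v_hi:
                    strategy[u] = u_lo  # arbitrary choice within the interval
                    break
            else:
                raise ValueError(f"no pair (v, u) found in B({v}, {u})")
            stack.append(u)
    return strategy
```

On a path W, V, U with B(W, V) = [0, 1] × {1/2} and B(V, U) = {1/2} × [0, 1], starting from w = 0 the sketch assigns v = 1/2 and u = 0.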

The best response policy for a vertex U given its parent V can be represented
as a union of rectangles, where a rectangle is defined by a pair of closed intervals
(I_V, I_U) and consists of all points in I_V × I_U; it may be the case that one or
both of the intervals I_V and I_U consist of a single point. In order to perform
computations on B(V, U), and to bound the number of rectangles, [9] used the
notion of an event point, which is defined as follows. For any set A ⊆ [0, 1]^2
that is represented as a union of a finite number of rectangles, we say that a
point u ∈ [0, 1] on the U-axis is a U-event point of A if u = 0 or u = 1 or the
representation of A contains a rectangle of the form I_V × I_U and u is an endpoint
of I_U; V-event points are defined similarly.
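
As an illustration of this definition, the event points of a set given as a list of rectangles can be collected as follows. The rectangle encoding and the function name are assumptions made for the sketch.

```python
def event_points(rectangles, axis):
    """Return the sorted event points of a union of rectangles on one axis.

    rectangles is a list of pairs ((v_lo, v_hi), (u_lo, u_hi)); axis is 0 for
    the V-axis and 1 for the U-axis. This encoding is an assumption.
    """
    points = {0, 1}  # 0 and 1 are event points by definition
    for rect in rectangles:
        lo, hi = rect[axis]
        points.update((lo, hi))  # both endpoints of the interval on this axis
    return sorted(points)
```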

For many games considered in this paper, the underlying graph is an n-vertex
path, i.e., a graph G = (V, E) with the vertex set V = {V_1, ..., V_n} and the edge
set E = {(V_1, V_2), ..., (V_{n-1}, V_n)}. In [9], it was shown that for such games, the
best response policy has only polynomially many rectangles. The proof that the
number of rectangles in B(V_{j+1}, V_j) is polynomial proceeds by first showing that
the number of event points in B(V_{j+1}, V_j) cannot exceed the number of event
points in B(V_j, V_{j-1}) by more than 2, and using this fact to bound the number
of rectangles in B(V_{j+1}, V_j).

Let P^0(V) and P^1(V) be the expected payoffs to V when it plays 0 and 1,
respectively. Both P^0(V) and P^1(V) are multilinear functions of the strategies
of V's neighbors. In what follows, we will frequently use the following simple
observation.

Claim. For a vertex V with a single child U and parent W, given any A, B, C, D ∈
Q, A', B', C', D' ∈ Q, one can select the payoffs to V so that P^0(V) = Auw +
Bu + Cw + D, P^1(V) = A'uw + B'u + C'w + D'. Moreover, if all A, B, C, D,
A', B', C', D' are integer, the payoffs to V are integer as well.

Proof. We will give the proof for P^0(V); the proof for P^1(V) is similar. For
i, j = 0, 1, let P_ij be the payoff to V when U plays i, V plays 0 and W plays j.
We have P^0(V) = P_00(1−u)(1−w) + P_10 u(1−w) + P_01(1−u)w + P_11 uw. We have
to select the values of P_ij so that P_00 − P_10 − P_01 + P_11 = A, −P_00 + P_10 = B,
−P_00 + P_01 = C, P_00 = D. It is easy to see that the unique solution is given by
P_00 = D, P_01 = C + D, P_10 = B + D, P_11 = A + B + C + D.
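
The solution in the proof can be checked mechanically. The sketch below builds the payoff table from the coefficients and evaluates its multilinear extension; the function names are illustrative, and exact rationals are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def payoffs_from_coefficients(A, B, C, D):
    """Return {(i, j): P_ij}, the payoff to V (for one fixed action of V)
    when U plays i and W plays j, chosen so that the expected payoff is
    A*u*w + B*u + C*w + D."""
    return {(0, 0): D, (0, 1): C + D, (1, 0): B + D, (1, 1): A + B + C + D}


def expected_payoff(P, u, w):
    """Multilinear extension of the table P at mixed strategies u and w."""
    return (P[0, 0] * (1 - u) * (1 - w) + P[1, 0] * u * (1 - w)
            + P[0, 1] * (1 - u) * w + P[1, 1] * u * w)


# Example: coefficients of the payoff -u*w + 3*w at an arbitrary rational point.
P0 = payoffs_from_coefficients(-1, 0, 3, 0)
u, w = Fraction(1, 3), Fraction(2, 5)
assert expected_payoff(P0, u, w) == -u * w + 3 * w
```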

The input to all algorithms considered in this paper includes the payoff
matrices for each player. We assume that all elements of these matrices are integer. Let
P_max be the greatest absolute value of any element of any payoff matrix. Then
the input consists of at most n2^{Δ+1} numbers, each of which can be represented
using ⌈log P_max⌉ bits.

3 Nash Equilibria That Maximize The Social Welfare: 
Solutions in R \ Q 

From the point of view of social welfare, the best Nash equilibrium is the one 
that maximizes the sum of the players' expected payoffs. Unfortunately, it turns 
out that computing such a strategy profile exactly is not possible: in this section, 
we show that even if all players' payoffs are integers, the strategy profile that 
maximizes the total payoff may have irrational coordinates; moreover, it may 
involve algebraic numbers of an arbitrary degree. 

3.1 Warm-up: quadratic irrationalities 

We start by providing an example of a graphical game on a path of length 3
with integer payoffs such that in the Nash equilibrium that maximizes the total
payoff, one of the players has a strategy in R \ Q. In the next subsection, we
will extend this example to algebraic numbers of arbitrary degree n; to do so,
we have to consider paths of length O(n).

Theorem 1. There exists an integer-payoff graphical game G on a 3-vertex path
UVW such that, in any Nash equilibrium of G that maximizes social welfare,
the strategy, u, of the player U and the total payoff, p, satisfy u, p ∈ R \ Q.

Proof. The payoffs to the players in G are specified as follows. The payoff to U
is identically 0, i.e., P^0(U) = P^1(U) = 0. Using Claim 2, we select the payoffs to
V so that P^0(V) = −uw + 3w and P^1(V) = P^0(V) + w(u + 2) − (u + 1), where
u and w are the (mixed) strategies of U and W, respectively. It follows that V
is indifferent between playing 0 and 1 if and only if w = f(u) = (u + 1)/(u + 2). Observe
that for any u ∈ [0, 1] we have f(u) ∈ [0, 1]. The payoff to W is 0 if it selects the
same action as V and 1 otherwise.

Claim. All Nash equilibria of the game G are of the form (u, 1/2, f(u)). That 
is, in any Nash equilibrium, V plays v = 1/2 and W plays w = f(u). Moreover, 
for any value of u, the vector of strategies (u, 1/2, f(u)) constitutes a Nash 
equilibrium. 

Proof. It is easy to check that for any u ∈ [0, 1], the vector (u, 1/2, f(u)) is a
Nash equilibrium. Indeed, U is content to play any mixed strategy u no matter
what V and W do. Furthermore, V is indifferent between 0 and 1 as long as
w = f(u), so it can play 1/2. Finally, if V plays 0 and 1 with equal probability,
W is indifferent between 0 and 1, so it can play f(u).

Conversely, suppose that v > 1/2. Then W strictly prefers to play 0, i.e.,
w = 0. Then for V we have P^1(V) = P^0(V) − (u + 1), i.e., P^1(V) < P^0(V),
which implies v = 0, a contradiction. Similarly, if v < 1/2, player W prefers
to play 1, so we have w = 1. Hence, P^1(V) = P^0(V) + (u + 2) − (u + 1), i.e.,
P^1(V) > P^0(V), which implies v = 1, a contradiction. Finally, if v = 1/2, but
w ≠ f(u), player V is not indifferent between 0 and 1, so he would deviate from
playing 1/2. This completes the proof of Claim 3.1.

By Claim 3.1, the total payoff in any Nash equilibrium of this game is a
function of u. More specifically, the payoff to U is 0, the payoff to V is −uf(u) +
3f(u), and the payoff to W is 1/2. Therefore, the Nash equilibrium with the
maximum total payoff corresponds to the value of u that maximizes

$$g(u) = -u\,\frac{u+1}{u+2} + 3\,\frac{u+1}{u+2} = -\frac{(u-3)(u+1)}{u+2}.$$

To find extrema of g(u), we compute h(u) = −(d/du) g(u). We have

$$h(u) = \frac{(2u-2)(u+2) - (u-3)(u+1)}{(u+2)^2} = \frac{u^2+4u-1}{(u+2)^2}.$$

Hence, h(u) = 0 if and only if u ∈ {−2 + √5, −2 − √5}. Note that −2 + √5 ∈ [0, 1].
The function g(u) changes sign at −2, −1, and 3. We have g(u) < 0 for
u > 3, g(u) > 0 for u < −2, so the extremum of g(u) that lies between −1 and
3, i.e., u = −2 + √5, is a local maximum. We conclude that the social
welfare-maximizing Nash equilibrium for this game is given by the vector of strategies
(−2 + √5, 1/2, (5 − √5)/5). The respective total payoff is

$$0 - \frac{(\sqrt{5}-5)(\sqrt{5}-1)}{\sqrt{5}} + \frac{1}{2} = \frac{13}{2} - 2\sqrt{5}.$$

This concludes the proof of Theorem 1.
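
As a numerical sanity check of the computation above (not a substitute for the proof), one can maximize g over a fine grid on [0, 1] and compare the result with the closed-form maximizer. The grid resolution and the variable names are arbitrary choices made for this sketch.

```python
import math


def f(u):
    """Strategy of W that makes V indifferent: f(u) = (u + 1) / (u + 2)."""
    return (u + 1) / (u + 2)


def g(u):
    """Payoff to V in the equilibrium (u, 1/2, f(u)): -u*f(u) + 3*f(u)."""
    return -u * f(u) + 3 * f(u)


u_star = math.sqrt(5) - 2             # claimed maximizer on [0, 1]
grid = [i / 10_000 for i in range(10_001)]
u_best = max(grid, key=g)             # brute-force maximizer on the grid

total_payoff = 0 + g(u_star) + 0.5    # payoffs of U, V and W, respectively
```

The grid maximizer agrees with −2 + √5 up to the grid spacing, and the total payoff at that point agrees with 13/2 − 2√5 up to floating-point error.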



3.2 Strategies of arbitrary degree 

We have shown that in the social welfare-maximizing Nash equilibrium, some
players' strategies can be quadratic irrationalities, and so can the total payoff.
In this subsection, we will extend this result to show that we can construct an 
integer-payoff graphical game on a path whose social welfare-maximizing Nash 
equilibrium involves arbitrary algebraic numbers in [0, 1]. 

Theorem 2. For any degree-n algebraic number α ∈ [0, 1], there exists an
integer-payoff graphical game on a path of length O(n) such that, in all social
welfare-maximizing Nash equilibria of this game, one of the players plays α.

Proof. Our proof consists of two steps. First, we construct a rational expression
R(x) and a segment [x', x''] such that x', x'' ∈ Q and α is the only maximum of
R(x) on [x', x'']. Second, we construct a graphical game whose Nash equilibria
can be parameterized by u ∈ [x', x''], so that at the equilibrium that corresponds
to u the total payoff is R(u) and, moreover, some player's strategy is u. It follows
that to achieve the payoff-maximizing Nash equilibrium, this player has to play
α. The details follow.

Lemma 1. Given an algebraic number α ∈ [0, 1], deg(α) = n, there exist
K_2, ..., K_{2n+2} ∈ Q and x', x'' ∈ (0, 1) ∩ Q such that α is the only maximum
of

$$R(x) = \frac{K_2}{x+2} + \dots + \frac{K_{2n+2}}{x+2n+2}$$

on [x', x''].

Proof. Let P(x) be the minimal polynomial of α, i.e., a polynomial of degree
n with rational coefficients whose leading coefficient is 1 such that P(α) = 0.
Let A = {α_1, ..., α_n} be the set of all roots of P(x). Consider the polynomial
Q_1(x) = −P^2(x). It has the same roots as P(x), and moreover, for any x ∉ A
we have Q_1(x) < 0. Hence, A is the set of all maxima of Q_1(x). Now, set
R(x) = Q_1(x)/((x + 2)(x + 3) ··· (x + 2n + 2)). Observe that R(x) ≤ 0 for all
x ∈ [0, 1] and R(x) = 0 if and only if Q_1(x) = 0. Hence, the set A is also the
set of all maxima of R(x) on [0, 1].

Let d = min{|α_i − α| : α_i ∈ A, α_i ≠ α}, and set α' = max{α − d/2, 0},
α'' = min{α + d/2, 1}. Clearly, α is the only zero (and hence, the only maximum)
of R(x) on [α', α'']. Let x' and x'' be some rational numbers in (α', α) and (α, α''),
respectively; note that by excluding the endpoints of the intervals we ensure that
x', x'' ≠ 0, 1. As [x', x''] ⊆ [α', α''], we have that α is the only maximum of R(x)
on [x', x''].

As R(x) is a proper rational expression and all roots of its denominator are
simple, by the partial fraction decomposition theorem, R(x) can be represented as

$$R(x) = \frac{K_2}{x+2} + \dots + \frac{K_{2n+2}}{x+2n+2},$$

where K_2, ..., K_{2n+2} are rational numbers.
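
Since all poles of R(x) are simple, the coefficients K_j can be computed by the standard residue formula K_j = Q_1(−j) / ∏_{m≠j} (m − j). The following sketch does this with exact rational arithmetic; it assumes the denominator (x + 2)(x + 3) ··· (x + 2n + 2), and the function name and the input convention (coefficients of P listed from lowest degree to highest) are choices made for this example only.

```python
from fractions import Fraction


def partial_fraction_coefficients(P):
    """Given the coefficients of P (lowest degree first), return {j: K_j} with
    -P(x)^2 / prod_{j=2}^{2n+2} (x + j) = sum_j K_j / (x + j), n = deg P."""
    n = len(P) - 1
    poles = range(2, 2 * n + 3)

    def Q1(x):
        value = sum(Fraction(c) * x ** k for k, c in enumerate(P))
        return -value * value

    K = {}
    for j in poles:
        denom = Fraction(1)
        for m in poles:
            if m != j:
                denom *= m - j  # value of the factor (x + m) at the pole x = -j
        K[j] = Q1(Fraction(-j)) / denom
    return K
```

For instance, for P(x) = x^2 + 4x − 1, the minimal polynomial of −2 + √5, the call `partial_fraction_coefficients([-1, 4, 1])` returns five rational coefficients K_2, ..., K_6.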
Consider a graphical game on the path

U_{-1} V_{-1} U_0 V_0 U_1 V_1 ... U_{k-1} V_{k-1} U_k,

where k = 2n + 2. Intuitively, we want each triple (U_{i-1}, V_{i-1}, U_i) to behave
similarly to the players U, V, and W from the game described in the previous
subsection. More precisely, we define the payoffs to the players in the following
way.

- The payoff to U_{-1} is 0 no matter what everyone else does.

- The expected payoff to V_{-1} is 0 if it plays 0 and u_0 − (x'' − x')u_{-1} − x' if it
plays 1, where u_0 and u_{-1} are the strategies of U_0 and U_{-1}, respectively.

- The expected payoff to V_0 is 0 if it plays 0 and u_1(u_0 + 1) − u_0 if it plays 1,
where u_0 and u_1 are the strategies of U_0 and U_1, respectively.

- For each i = 1, ..., k − 1, the expected payoff to V_i when it plays 0 is
P^0(V_i) = A_i u_i u_{i+1} − A_i u_{i+1}, and the expected payoff to V_i when it plays 1
is P^1(V_i) = P^0(V_i) + u_{i+1}(2 − u_i) − 1, where A_i = −K_{i+1} and u_{i+1} and u_i
are the strategies of U_{i+1} and U_i, respectively.

- For each i = 0, ..., k, the payoff to U_i does not depend on V_i and is 1 if U_i
and V_{i-1} select different actions and 0 otherwise.

We will now characterize the Nash equilibria of this game using a sequence of 
claims. 

Claim. In all Nash equilibria of this game V_{-1} plays 1/2, and the strategies of
U_{-1} and U_0 satisfy u_0 = (x'' − x')u_{-1} + x'. Consequently, in all Nash equilibria
we have u_0 ∈ [x', x''].

Proof. The proof is similar to that of Claim 3.1. Let f(u_{-1}) = (x'' − x')u_{-1} + x'.
Clearly, the player V_{-1} is indifferent between playing 0 and 1 if and only if
u_0 = f(u_{-1}). Suppose that v_{-1} < 1/2. Then U_0 strictly prefers to play 1, i.e.,
u_0 = 1, so we have

P^1(V_{-1}) = P^0(V_{-1}) + 1 − (x'' − x')u_{-1} − x'.

As

1 − x'' ≤ 1 − (x'' − x')u_{-1} − x' ≤ 1 − x'

for u_{-1} ∈ [0, 1] and x'' < 1, we have P^1(V_{-1}) > P^0(V_{-1}), so V_{-1} prefers to play
1, a contradiction. Similarly, if v_{-1} > 1/2, the player U_0 strictly prefers to play
0, i.e., u_0 = 0, so we have

P^1(V_{-1}) = P^0(V_{-1}) − (x'' − x')u_{-1} − x'.

As x' < x'', x' > 0, we have P^1(V_{-1}) < P^0(V_{-1}), so V_{-1} prefers to play 0,
a contradiction. Finally, if V_{-1} plays 1/2, but u_0 ≠ f(u_{-1}), player V_{-1} is not
indifferent between 0 and 1, so he would deviate from playing 1/2.

Also, note that f(0) = x', f(1) = x'', and, moreover, f(u_{-1}) ∈ [x', x''] if
and only if u_{-1} ∈ [0, 1]. Hence, in all Nash equilibria of this game we have
u_0 ∈ [x', x''].

Claim. In all Nash equilibria of this game for each i = 0, ..., k − 1, we have
v_i = 1/2, and the strategies of the players U_i and U_{i+1} satisfy u_{i+1} = f_i(u_i),
where f_0(u) = u/(u + 1) and f_i(u) = 1/(2 − u) for i > 0.

Proof. The proof of this claim is also similar to that of Claim 3.1. We use
induction on i to prove that the statement of the claim is true and, additionally,
u_i ≠ 1 for i > 0.

For the base case i = 0, note that u_0 ≠ 0 by the previous claim (recall that
x', x'' are selected so that x', x'' ≠ 0, 1) and consider the triple (U_0, V_0, U_1).
Let v_0 be the strategy of V_0. First, suppose that v_0 > 1/2. Then U_1 strictly
prefers to play 0, i.e., u_1 = 0. Then for V_0 we have P^1(V_0) = P^0(V_0) − u_0.
As u_0 ≠ 0, we have P^1(V_0) < P^0(V_0), which implies v_0 = 0, a contradiction.
Similarly, if v_0 < 1/2, player U_1 prefers to play 1, so we have u_1 = 1. Hence,
P^1(V_0) = P^0(V_0) + 1. It follows that P^1(V_0) > P^0(V_0), which implies v_0 = 1,
a contradiction. Finally, if v_0 = 1/2, but u_1 ≠ u_0/(u_0 + 1), player V_0 is not
indifferent between 0 and 1, so he would deviate from playing 1/2. Moreover, as
u_1 = u_0/(u_0 + 1) and u_0 ∈ [0, 1], we have u_1 ≠ 1.

The argument for the inductive step is similar. Namely, suppose that the
statement is proved for all i' < i and consider the triple (U_i, V_i, U_{i+1}).

Let v_i be the strategy of V_i. First, suppose that v_i > 1/2. Then U_{i+1} strictly
prefers to play 0, i.e., u_{i+1} = 0. Then for V_i we have P^1(V_i) = P^0(V_i) − 1, i.e.,
P^1(V_i) < P^0(V_i), which implies v_i = 0, a contradiction. Similarly, if v_i < 1/2,
player U_{i+1} prefers to play 1, so we have u_{i+1} = 1. Hence, P^1(V_i) = P^0(V_i) +
1 − u_i. By the inductive hypothesis, we have u_i < 1. Consequently, P^1(V_i) > P^0(V_i),
which implies v_i = 1, a contradiction. Finally, if v_i = 1/2, but u_{i+1} ≠ 1/(2 − u_i),
player V_i is not indifferent between 0 and 1, so he would deviate from playing
1/2. Moreover, as u_{i+1} = 1/(2 − u_i) and u_i < 1, we have u_{i+1} < 1.

Claim. Any strategy profile of the form

(u_{-1}, 1/2, u_0, 1/2, u_1, 1/2, ..., u_{k-1}, 1/2, u_k),

where u_{-1} ∈ [0, 1], u_0 = (x'' − x')u_{-1} + x', u_1 = u_0/(u_0 + 1), and u_{i+1} = 1/(2 − u_i)
for i ≥ 1 constitutes a Nash equilibrium.
Proof. First, the player U_{-1}'s payoffs do not depend on other players' actions,
so he is free to play any strategy in [0, 1]. As long as u_0 = (x'' − x')u_{-1} + x',
player V_{-1} is indifferent between 0 and 1, so he is content to play 1/2; a similar
argument applies to players V_0, ..., V_{k-1}. Finally, for each i = 0, ..., k, the
payoffs of player U_i only depend on the strategy of player V_{i-1}. In particular,
as long as v_{i-1} = 1/2, player U_i is indifferent between playing 0 and 1, so
he can play any mixed strategy u_i ∈ [0, 1]. To complete the proof, note that
(x'' − x')u_{-1} + x' ∈ [0, 1] for all u_{-1} ∈ [0, 1], u_0/(u_0 + 1) ∈ [0, 1] for all u_0 ∈ [0, 1],
and 1/(2 − u_i) ∈ [0, 1] for all u_i ∈ [0, 1], so we have u_i ∈ [0, 1] for all i = 0, ..., k.

Now, let us compute the total payoff under a strategy profile of the form
given in Claim 3.2. The payoff to U_{-1} is 0, and the expected payoff to each of
the U_i, i = 0, ..., k, is 1/2. The expected payoffs to V_{-1} and V_0 are 0. Finally,
for any i = 1, ..., k − 1, the expected payoff to V_i is T_i = A_i u_i u_{i+1} − A_i u_{i+1}.
It follows that to find a Nash equilibrium with the highest total payoff, we have
to maximize Σ_{i=1}^{k-1} T_i subject to conditions u_{-1} ∈ [0, 1], u_0 = (x'' − x')u_{-1} + x',
u_1 = u_0/(u_0 + 1), and u_{i+1} = 1/(2 − u_i) for i = 1, ..., k − 1.

We would like to express Σ_{i=1}^{k-1} T_i as a function of u_0. To simplify notation,
set u = u_0.

Lemma 2. For i = 1, ..., k, we have u_i = (u + i − 1)/(u + i).

Proof. The proof is by induction on i. For i = 1, we have u_1 = u/(u + 1). Now,
for i ≥ 2 suppose that u_{i-1} = (u + i − 2)/(u + i − 1). We have u_i = 1/(2 − u_{i-1}) =
(u + i − 1)/(2u + 2i − 2 − u − i + 2) = (u + i − 1)/(u + i).
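
The closed form of Lemma 2 is easy to confirm on a concrete instance by iterating the recurrence with exact rationals. The starting value 2/7 and the length k = 8 below are arbitrary choices made for this sketch.

```python
from fractions import Fraction


def strategies(u0, k):
    """Return [u_0, ..., u_k] with u_1 = u_0/(u_0 + 1), u_{i+1} = 1/(2 - u_i)."""
    us = [u0, u0 / (u0 + 1)]
    for _ in range(2, k + 1):
        us.append(1 / (2 - us[-1]))
    return us


u0 = Fraction(2, 7)  # arbitrary rational test value in (0, 1)
us = strategies(u0, 8)
# Compare each iterate with the closed form (u + i - 1)/(u + i) of Lemma 2.
closed_form_holds = all(us[i] == (u0 + i - 1) / (u0 + i) for i in range(1, 9))
```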

It follows that for i = 1, ..., k − 1 we have

$$T_i = A_i\,\frac{u+i-1}{u+i}\cdot\frac{u+i}{u+i+1} - A_i\,\frac{u+i}{u+i+1} = -A_i\,\frac{1}{u+i+1} = \frac{K_{i+1}}{u+i+1}.$$

Observe that as u_{-1} varies from 0 to 1, u varies from x' to x''. Therefore, to
maximize the total payoff, we have to choose u ∈ [x', x''] so as to maximize

$$\frac{K_2}{u+2} + \dots + \frac{K_k}{u+k} = R(u).$$

By construction, the only maximum of R(u) on [x', x''] is α. It follows that in
the payoff-maximizing Nash equilibrium of our game U_0 plays α.

Finally, note that the payoffs in our game are rational rather than integer.
However, it is easy to see that we can multiply all payoffs to a player by a
positive common multiple of their denominators without affecting his strategy.
In the resulting game, all payoffs are integer. This concludes the proof of Theorem 2.
4 Approximating The Socially Optimal Nash Equilibrium



We have seen that the Nash equilibrium that maximizes the social welfare may 
involve strategies that are not in Q. Hence, in this section we focus on finding a 
Nash equilibrium that is almost optimal from the social welfare perspective. We 
propose an algorithm that for any ε > 0 finds a Nash equilibrium whose total
payoff is within ε from optimal. The running time of this algorithm is polynomial
in 1/ε, n and P_max (recall that P_max is the maximum absolute value of an entry
of a payoff matrix).

While the negative result of the previous section is for graphical games on 
paths, our algorithm applies to a wider range of scenarios. Namely, it runs in 
polynomial time on bounded-degree trees as long as the best response policy 
of each vertex, given its parent, can be represented as a union of a polynomial 
number of rectangles. Note that path graphs always satisfy this condition: in [9] 
we showed how to compute such a representation, given a graph with maximum 
degree 2. Consequently, for path graphs the running time of our algorithm is 
guaranteed to be polynomial. (Note that [9] exhibits a family of graphical games 
on bounded-degree trees for which the best response policies of some of the 
vertices, given their parents, have exponential size, when represented as unions 
of rectangles.) 

Due to space restrictions, in this version of the paper we present the algorithm 
for the case where the graph underlying the graphical game is a path. We then 
state our result for the general case; the proof can be found in the appendix. 

Suppose that $s$ is a strategy profile for a graphical game $G$. That is, $s$ assigns
a mixed strategy to each vertex of $G$. Let $EP_V(s)$ be the expected payoff of
player $V$ under $s$ and let $EP(s) = \sum_V EP_V(s)$. Let

$M(G) = \max\{EP(s) \mid s \text{ is a Nash equilibrium for } G\}$.

Theorem 3. Suppose that $G$ is a graphical game on an $n$-vertex path. Then for
any $\epsilon > 0$ there is an algorithm that constructs a Nash equilibrium $s'$ for $G$ that
satisfies $EP(s') \ge M(G) - \epsilon$. The running time of the algorithm is $O(n^4 P_{\max}^3/\epsilon^3)$.

Proof. Let $\{V_1, \dots, V_n\}$ be the set of all players. We start by constructing the
best response policies for all $V_i$, $i = 1, \dots, n - 1$. As shown in [9], this can be
done in time $O(n^3)$.

Let $N > 5n$ be a parameter to be selected later, set $\delta = 1/N$, and define
$X = \{j\delta \mid j = 0, \dots, N\}$. We say that $v_j$ is an event point for a player $V_i$ if it
is a $V_i$-event point for $B(V_i, V_{i-1})$ or $B(V_{i+1}, V_i)$. For each player $V_i$, consider a
finite set of strategies $X_i$ given by

$X_i = X \cup \{v_j \mid v_j \text{ is an event point for } V_i\}$.

It has been shown in [9] that for any $i = 2, \dots, n$, the best response policy
$B(V_i, V_{i-1})$ has at most $2n + 4$ $V_i$-event points. As we require $N > 5n$, we
have $|X_i| \le 2N$; assume without loss of generality that $|X_i| = 2N$. Order the
elements of $X_i$ in increasing order as $x_i^1 = 0 < x_i^2 < \dots < x_i^{2N}$. We will refer
to the strategies in $X_i$ as discrete strategies of player $V_i$; a strategy profile in
which each player has a discrete strategy will be referred to as a discrete strategy
profile.
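The construction of the discrete strategy set can be sketched as follows. This is only an illustration: it assumes the event points of a player are already available as a list (computing them is the job of the algorithm of [9] and is not shown), and the name `discrete_strategies` is hypothetical.

```python
from fractions import Fraction


def discrete_strategies(N, event_points):
    """Return X_i = X ∪ {event points} in increasing order.

    X is the uniform grid {j/N : j = 0, ..., N}; event_points is assumed to be
    a list of numbers in [0, 1] supplied by the caller.
    """
    grid = {Fraction(j, N) for j in range(N + 1)}
    return sorted(grid | {Fraction(v) for v in event_points})
```

Exact fractions are used so that an event point coinciding with a grid point is stored only once.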

We will now show that even if we restrict each player $V_i$ to strategies from $X_i$,
the players can still achieve a Nash equilibrium, and moreover, the best such
Nash equilibrium (with respect to the social welfare) has total payoff at least
$M(G) - \epsilon$ as long as $N$ is large enough.

Let $s$ be a Nash equilibrium that maximizes social welfare. That is, let $s =
(s_1, \dots, s_n)$ where $s_i$ is the mixed strategy of player $V_i$ and $EP(s) = M(G)$. For
$i = 1, \dots, n$, let $t_i = \max\{x_i^j \mid x_i^j \le s_i\}$. First, we will show that the strategy
profile $t = (t_1, \dots, t_n)$ is a Nash equilibrium for $G$.

Fix any $i$, $1 < i \le n$, and let $R = [v_1, v_2] \times [u_1, u_2]$ be the rectangle in
$B(V_i, V_{i-1})$ that contains $(s_i, s_{i-1})$. As $v_1$ is a $V_i$-event point of $B(V_i, V_{i-1})$, we
have $v_1 \le t_i$, so the point $(t_i, s_{i-1})$ is inside $R$. Similarly, the point $u_1$ is a $V_{i-1}$-event
point of $B(V_i, V_{i-1})$, so we have $u_1 \le t_{i-1}$, and therefore the point $(t_i, t_{i-1})$
is inside $R$. This means that for any $i$, $1 < i \le n$, we have $t_{i-1} \in \mathrm{pbr}_{V_{i-1}}(t_i)$,
which implies that $t = (t_1, \dots, t_n)$ is a Nash equilibrium for $G$.
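The rounding step that maps each equilibrium strategy to the largest discrete strategy not exceeding it can be written with a binary search. This is a minimal sketch under the assumption that the discrete strategies are given as a sorted list containing 0; the name `round_down` is illustrative.

```python
from bisect import bisect_right


def round_down(sorted_strategies, s):
    """Return max{x in sorted_strategies : x <= s}.

    sorted_strategies must be sorted in increasing order. A ValueError is
    raised if every element exceeds s, which cannot happen when 0 is in the
    list and s lies in [0, 1].
    """
    idx = bisect_right(sorted_strategies, s)
    if idx == 0:
        raise ValueError("no discrete strategy is <= s")
    return sorted_strategies[idx - 1]
```

Because the list contains the uniform grid with spacing $1/N$, the rounded value differs from the original strategy by at most $1/N$.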

Now, let us estimate the expected loss in social welfare caused by playing t 
instead of s. 

Lemma 3. For any pair of strategy profiles $t$, $s$ such that $|t_i - s_i| \le \delta$ we have
$|EP_{V_i}(s) - EP_{V_i}(t)| \le 24P_{\max}\delta$ for any $i = 1, \dots, n$.

Proof. Let $P^i_{klm}$ be the payoff of the player $V_i$ when he plays $l$, $V_{i-1}$ plays $k$,
and $V_{i+1}$ plays $m$. Fix $i = 1, \dots, n$ and for $k, l, m \in \{0, 1\}$, set

$t^{klm} = t_{i-1}^{k}(1 - t_{i-1})^{1-k}\, t_i^{l}(1 - t_i)^{1-l}\, t_{i+1}^{m}(1 - t_{i+1})^{1-m}$,
$s^{klm} = s_{i-1}^{k}(1 - s_{i-1})^{1-k}\, s_i^{l}(1 - s_i)^{1-l}\, s_{i+1}^{m}(1 - s_{i+1})^{1-m}$.

We have

$|EP_{V_i}(s) - EP_{V_i}(t)| \le \sum_{k,l,m = 0,1} |P^i_{klm}(t^{klm} - s^{klm})| \le 8P_{\max} \max_{klm} |t^{klm} - s^{klm}|$.

We will now show that for any $k, l, m \in \{0, 1\}$ we have $|t^{klm} - s^{klm}| \le 3\delta$; clearly,
this implies the lemma.

Indeed, fix $k, l, m \in \{0, 1\}$. Set

$x = t_{i-1}^{k}(1 - t_{i-1})^{1-k}$, $x' = s_{i-1}^{k}(1 - s_{i-1})^{1-k}$,
$y = t_i^{l}(1 - t_i)^{1-l}$, $y' = s_i^{l}(1 - s_i)^{1-l}$,
$z = t_{i+1}^{m}(1 - t_{i+1})^{1-m}$, $z' = s_{i+1}^{m}(1 - s_{i+1})^{1-m}$.

Observe that if $k = 0$ then $x - x' = (1 - t_{i-1}) - (1 - s_{i-1})$, and if $k = 1$ then
$x - x' = t_{i-1} - s_{i-1}$, so $|x - x'| \le \delta$. A similar argument shows $|y - y'| \le \delta$, $|z - z'| \le
\delta$. Also, we have $x, x', y, y', z, z' \in [0, 1]$. Hence, $|t^{klm} - s^{klm}| = |xyz - x'y'z'| =
|xyz - x'yz + x'yz - x'y'z + x'y'z - x'y'z'| \le |x - x'|yz + |y - y'|x'z + |z - z'|x'y' \le 3\delta$.

Lemma 3 implies $\sum_{i=1}^{n} |EP_{V_i}(s) - EP_{V_i}(t)| \le 24nP_{\max}\delta$, so by choosing
$\delta \le \epsilon/(24nP_{\max})$, or, equivalently, setting $N \ge 24nP_{\max}/\epsilon$, we can ensure that
the total expected payoff for the strategy profile $t$ is within $\epsilon$ of optimal.
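The bound of Lemma 3 and the resulting choice of $N$ can be illustrated numerically. The sketch below is not part of the proof: it draws random integer payoff tables and random nearby strategy triples, and reports the largest payoff gap observed. All names (`outcome_prob`, `expected_payoff`, `max_gap`, `grid_size`) and the random test harness are assumptions made for illustration.

```python
import itertools
import math
import random


def outcome_prob(profile, k, l, m):
    """Probability that V_{i-1}, V_i, V_{i+1} play pure actions k, l, m."""
    def factor(p, action):
        return p if action == 1 else 1 - p
    p_prev, p_own, p_next = profile
    return factor(p_prev, k) * factor(p_own, l) * factor(p_next, m)


def expected_payoff(payoff, profile):
    """Expected payoff of V_i; payoff[k][l][m] is its payoff for actions k, l, m."""
    return sum(payoff[k][l][m] * outcome_prob(profile, k, l, m)
               for k, l, m in itertools.product((0, 1), repeat=3))


def max_gap(trials, delta, p_max, seed=0):
    """Largest |EP(s) - EP(t)| seen over random instances with |t_j - s_j| <= delta."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        payoff = [[[rng.randint(-p_max, p_max) for _ in range(2)]
                   for _ in range(2)] for _ in range(2)]
        s = [rng.random() for _ in range(3)]
        # Clamping to [0, 1] never increases the distance to s.
        t = [min(1.0, max(0.0, x + rng.uniform(-delta, delta))) for x in s]
        gap = abs(expected_payoff(payoff, s) - expected_payoff(payoff, t))
        worst = max(worst, gap)
    return worst


def grid_size(n, p_max, eps):
    """Smallest integer N with N > 5n and N >= 24 * n * p_max / eps."""
    return max(5 * n + 1, math.ceil(24 * n * p_max / eps))
```

In such experiments the observed gap is typically well below $24P_{\max}\delta$, since the lemma is a worst-case bound.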

We will now show that we can find the best discrete Nash equilibrium (with 
respect to the social welfare) using dynamic programming. As t is a discrete 
strategy profile, this means that the strategy profile found by our algorithm will 
be at least as good as t. 

Define $m_i^{l,k}$ to be the maximum total payoff that $V_1, \dots, V_{i-1}$ can achieve if
each $V_j$, $j \le i$, chooses a strategy from $X_j$, for each $j < i$ the strategy of $V_j$
is a potential best response to the strategy of $V_{j+1}$, and, moreover, $V_{i-1}$ plays
$x_{i-1}^l$, $V_i$ plays $x_i^k$. If there is no way to choose the strategies for $V_1, \dots, V_{i-1}$
to satisfy these conditions, we set $m_i^{l,k} = -\infty$. The values $m_i^{l,k}$, $i = 1, \dots, n$;
$k, l = 1, \dots, N$, can be computed inductively, as follows.

We have $m_1^{l,k} = 0$ for $k, l = 1, \dots, N$. Now, suppose that we have already
computed $m_j^{l,k}$ for all $j < i$; $k, l = 1, \dots, N$. To compute $m_i^{l,k}$, we first check if
$(x_i^k, x_{i-1}^l) \in B(V_i, V_{i-1})$. If this is not the case, we have $m_i^{l,k} = -\infty$. Otherwise,
consider the set $Y = X_{i-2} \cap \mathrm{pbr}_{V_{i-2}}(x_{i-1}^l)$, i.e., the set of all discrete strategies
of $V_{i-2}$ that are potential best responses to $x_{i-1}^l$. The proof of Theorem 1 in [9]
implies that the set $\mathrm{pbr}_{V_{i-2}}(x_{i-1}^l)$ is non-empty: the player $V_{i-2}$ has a potential
best response to any strategy of $V_{i-1}$, in particular, $x_{i-1}^l$. By construction of
the set $X_{i-2}$, this implies that $Y$ is not empty. For each $x_{i-2}^j \in Y$, let $p_{jlk}$
be the payoff that $V_{i-1}$ receives when $V_{i-2}$ plays $x_{i-2}^j$, $V_{i-1}$ plays $x_{i-1}^l$, and
$V_i$ plays $x_i^k$. Clearly, $p_{jlk}$ can be computed in constant time. Then we have
$m_i^{l,k} = \max\{m_{i-1}^{j,l} + p_{jlk} \mid x_{i-2}^j \in Y\}$.

Finally, suppose that we have computed $m_n^{l,k}$ for $l, k = 1, \dots, N$. We still
need to take into account the payoff of player $V_n$. Hence, we consider all pairs
$(x_n^k, x_{n-1}^l)$ that satisfy $x_{n-1}^l \in \mathrm{pbr}_{V_{n-1}}(x_n^k)$, and pick the one that maximizes the
sum of $m_n^{l,k}$ and the payoff of $V_n$ when he plays $x_n^k$ and $V_{n-1}$ plays $x_{n-1}^l$. This
results in the maximum total payoff the players can achieve in a Nash equilibrium
using discrete strategies; the actual strategy profile that produces this payoff can
be reconstructed using standard dynamic programming techniques.
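The dynamic program above can be sketched in code. The following is an illustration under simplifying assumptions, not the authors' implementation: players are numbered from 0, the test "is a potential best response" is abstracted into a caller-supplied oracle `compatible` (standing in for membership in the best response policy, whose construction is not shown), and `payoff` is a caller-supplied function. The state is the pair of strategies of two adjacent players, as in the definition of $m_i^{l,k}$.

```python
import math


def best_discrete_profile(X, compatible, payoff):
    """Maximise total payoff over discrete profiles on a path.

    X[i]                        -- discrete strategies of player i (0-based)
    compatible(i, x_i, x_next)  -- hypothetical oracle: True if x_i is a
                                   potential best response of player i to the
                                   strategy x_next of player i + 1
    payoff(i, x_prev, x_i, x_next) -- expected payoff of player i; x_prev or
                                   x_next is None at the ends of the path
    Returns (value, profile), or (-inf, None) if no profile is feasible.
    """
    n = len(X)
    if n == 1:
        best = max(X[0], key=lambda x: payoff(0, None, x, None))
        return payoff(0, None, best, None), [best]

    # tables[t][(a, b)] = (value, c): best total payoff of players 0..t when
    # player t plays X[t][a] and player t+1 plays X[t+1][b]; c is the index
    # chosen for player t-1 (None for t = 0).
    table = {}
    for a, xa in enumerate(X[0]):
        for b, xb in enumerate(X[1]):
            if compatible(0, xa, xb):
                table[(a, b)] = (payoff(0, None, xa, xb), None)
    tables = [table]

    for i in range(2, n):
        new_table = {}
        for (c, a), (value, _) in table.items():
            xc, xa = X[i - 2][c], X[i - 1][a]
            for b, xb in enumerate(X[i]):
                if not compatible(i - 1, xa, xb):
                    continue
                cand = value + payoff(i - 1, xc, xa, xb)
                if (a, b) not in new_table or cand > new_table[(a, b)][0]:
                    new_table[(a, b)] = (cand, c)
        table = new_table
        tables.append(table)

    # Add the payoff of the last player and pick the best final pair.
    best_value, best_key = -math.inf, None
    for (a, b), (value, _) in table.items():
        total = value + payoff(n - 1, X[n - 2][a], X[n - 1][b], None)
        if total > best_value:
            best_value, best_key = total, (a, b)
    if best_key is None:
        return -math.inf, None

    # Follow the back-pointers from the last pair to the first player.
    a, b = best_key
    indices = [b, a]
    for t in range(len(tables) - 1, 0, -1):
        c = tables[t][(a, b)][1]
        indices.append(c)
        a, b = c, a
    indices.reverse()
    return best_value, [X[i][j] for i, j in enumerate(indices)]
```

Each table has at most $N^2$ entries (for strategy sets of size $N$) and each entry is extended in $O(N)$ ways, which matches the $O(nN^3)$ bound stated in the text. The sketch treats the last player as unconstrained, which is a simplification.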

It is easy to see that each $m_i^{l,k}$ can be computed in time $O(N)$, i.e., all of them
can be computed in time $O(nN^3)$. Recall that we have to select $N \ge (24nP_{\max})/\epsilon$
to ensure that the strategy profile we output has total payoff that is within $\epsilon$
of optimal. We conclude that we can compute an $\epsilon$-approximation to the best
Nash equilibrium in time $O(n^4 P_{\max}^3/\epsilon^3)$. This completes the proof of Theorem 3.

To state our result for the general case (i.e., when the underlying graph is 
a bounded-degree tree rather than a path), we need additional notation. If G 
has n players, let q(n) be an upper bound on the number of event points in 
the representation of any best response policy. That is, we assume that for any 
vertex U with parent V, B(V, U) has at most q(n) event points. We will be 
interested in the situation in which $q(n)$ is polynomial in $n$.

Theorem 4. Let $G$ be an $n$-player graphical game on a tree in which each node
has at most $\Delta$ children. Suppose we are given a set of best-response policies
for $G$ in which each best-response policy $B(V, U)$ is represented by a set of
rectangles with at most $q(n)$ event points. For any $\epsilon > 0$, there is an algorithm that
constructs a Nash equilibrium $s'$ for $G$ that satisfies $EP(s') \ge M(G) - \epsilon$. The
running time of the algorithm is polynomial in $n$, $P_{\max}$ and $\epsilon^{-1}$ provided that
the tree has bounded degree (that is, $\Delta = O(1)$) and $q(n)$ is a polynomial in $n$.
In particular, if

$N = \max((\Delta + 1)q(n) + 1,\; n2^{\Delta+2}(\Delta + 2)P_{\max}\epsilon^{-1})$

and $\Delta > 1$ then the running time is $O(n\Delta(2N)^{\Delta})$.

The proof of this theorem can be found in the appendix. 

4.1 A polynomial-time algorithm for multiplicative approximation 

The running time of our algorithm is pseudopolynomial rather than polynomial,
because it includes a factor which is polynomial in $P_{\max}$, the maximum (in
absolute value) entry in any payoff matrix. If we are interested in a multiplicative
approximation rather than an additive one, this can be improved to polynomial.

First, note that we cannot expect a multiplicative approximation for all inputs.
That is, we cannot hope to have an algorithm that computes a Nash
equilibrium with total payoff at least $(1 - \epsilon)M(G)$. If we had such an algorithm,
then for graphical games $G$ with $M(G) = 0$, the algorithm would be required to
output the optimal solution. To show that this is infeasible, observe that we can
use the techniques of Section 3.2 to construct two integer-coefficient graphical
games on paths of length $O(n)$ such that for some $X \in \mathbb{R}$ the maximal total
payoff in the first game is $X$, the maximal total payoff in the second game is
$-X$, and for both games, the strategy profiles that achieve the maximal total
payoffs involve algebraic numbers of degree $n$. By combining the two games so
that the first vertex of the second game becomes connected to the last vertex of
the first game, but the payoffs of all players do not change, we obtain a graphical
game in which the best Nash equilibrium has total payoff 0, yet the strategies
that lead to this payoff have high algebraic complexity.

However, we can achieve a multiplicative approximation when all entries
of the payoff matrices are positive and the ratio between any two entries is
polynomially bounded. Recall that we assume that all payoffs are integer, and
let $P_{\min} > 0$ be the smallest entry of any payoff matrix. In this case, for any
strategy profile the payoff to player $i$ is at least $P_{\min}$, so the total payoff in the
social-welfare maximizing Nash equilibrium $s$ satisfies $M(G) \ge nP_{\min}$. Moreover,
Lemma 3 implies that by choosing $\delta \le \epsilon/(24P_{\max}/P_{\min})$, we can ensure that the
Nash equilibrium $t$ produced by our algorithm satisfies

$\sum_{i=1}^{n} EP_{V_i}(s) - \sum_{i=1}^{n} EP_{V_i}(t) \le 24P_{\max}\delta n \le \epsilon nP_{\min} \le \epsilon M(G)$,

i.e., for this value of $\delta$ we have $\sum_{i=1}^{n} EP_{V_i}(t) \ge (1 - \epsilon)M(G)$. Recall that the
running time of our algorithm is $O(nN^3)$, where $N$ has to be selected to satisfy
$N > 5n$, $N = 1/\delta$. It follows that if $P_{\min} > 0$, $P_{\max}/P_{\min} = \mathrm{poly}(n)$, we
can choose $N$ so that our algorithm provides a multiplicative approximation
guarantee and runs in time polynomial in $n$ and $1/\epsilon$.
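The choice of $N$ for the multiplicative guarantee amounts to a one-line calculation. The sketch below is illustrative only; the function name is hypothetical, and it simply picks the smallest integer $N$ with $N > 5n$ and $1/N \le \epsilon P_{\min}/(24P_{\max})$.

```python
import math


def grid_size_multiplicative(n, p_min, p_max, eps):
    """Smallest integer N with N > 5n and 1/N <= eps * p_min / (24 * p_max)."""
    if p_min <= 0:
        raise ValueError("the multiplicative guarantee needs positive payoffs")
    return max(5 * n + 1, math.ceil(24 * p_max / (eps * p_min)))
```

Note that $N$ depends on the payoffs only through the ratio $P_{\max}/P_{\min}$, which is why a polynomially bounded ratio gives a polynomial running time.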

5 Bounded Payoff Nash Equilibria 

Another natural way to define what is a "good" Nash equilibrium is to require 
that each player's expected payoff exceeds a certain threshold. These thresholds 
do not have to be the same for all players. In this case, in addition to the payoff
matrices of the $n$ players, we are given $n$ numbers $T_1, \dots, T_n$, and our goal is to
find a Nash equilibrium in which the payoff of player $i$ is at least $T_i$, or report
that no such Nash equilibrium exists. It turns out that we can design an FPTAS
for this problem using the same techniques as in the previous section. 

Theorem 5. Given a graphical game $G$ on an $n$-vertex path and $n$ rational
numbers $T_1, \dots, T_n$, suppose that there exists a strategy profile $s$ such that $s$ is a
Nash equilibrium for $G$ and $EP_{V_i}(s) \ge T_i$ for $i = 1, \dots, n$. Then for any $\epsilon > 0$
we can find in time $O(\max\{nP_{\max}^3/\epsilon^3, n^4/\epsilon^3\})$ a strategy profile $s'$ such that $s'$
is a Nash equilibrium for $G$ and $EP_{V_i}(s') \ge T_i - \epsilon$ for $i = 1, \dots, n$.

Proof. The proof is similar to that of Theorem 3. First, we construct the best
response policies for all players, choose $N > 5n$, and construct the sets $X_i$,
$i = 1, \dots, n$, as described in the proof of Theorem 3.

Consider a strategy profile $s$ such that $s$ is a Nash equilibrium for $G$ and
$EP_{V_i}(s) \ge T_i$ for $i = 1, \dots, n$. We construct a strategy profile $t$ by setting
$t_i = \max\{x_i^j \mid x_i^j \le s_i\}$ and use the same argument as in the proof of Theorem 3 to show that $t$ is a
Nash equilibrium for $G$. By Lemma 3, we have $|EP_{V_i}(s) - EP_{V_i}(t)| \le 24P_{\max}\delta$,
so choosing $\delta \le \epsilon/(24P_{\max})$, or, equivalently, $N \ge \max\{5n, 24P_{\max}/\epsilon\}$, we can
ensure $EP_{V_i}(t) \ge T_i - \epsilon$ for $i = 1, \dots, n$.

Now, we will use dynamic programming to find a discrete Nash equilibrium
that satisfies $EP_{V_i}(t) \ge T_i - \epsilon$ for $i = 1, \dots, n$. As $t$ is a discrete strategy
profile, our algorithm will succeed whenever there is a Nash equilibrium $s$ with
$EP_{V_i}(s) \ge T_i$ for $i = 1, \dots, n$.

Let $z_i^{l,k} = 1$ if there is a discrete strategy profile such that for any $j < i$ the
strategy of the player $V_j$ is a potential best response to the strategy of $V_{j+1}$, the
expected payoff of $V_j$ is at least $T_j - \epsilon$, and, moreover, $V_{i-1}$ plays $x_{i-1}^l$, $V_i$ plays
$x_i^k$. Otherwise, let $z_i^{l,k} = 0$. We can compute $z_i^{l,k}$, $i = 1, \dots, n$; $k, l = 1, \dots, N$
inductively, as follows.

We have $z_1^{l,k} = 1$ for $k, l = 1, \dots, N$. Now, suppose that we have already
computed $z_j^{l,k}$ for all $j < i$; $k, l = 1, \dots, N$. To compute $z_i^{l,k}$, we first check
if $(x_i^k, x_{i-1}^l) \in B(V_i, V_{i-1})$. If this is not the case, clearly, $z_i^{l,k} = 0$. Otherwise,
consider the set $Y = X_{i-2} \cap \mathrm{pbr}_{V_{i-2}}(x_{i-1}^l)$, i.e., the set of all discrete strategies
of $V_{i-2}$ that are potential best responses to $x_{i-1}^l$. It has been shown in the proof
of Theorem 3 that $Y \ne \emptyset$. For each $x_{i-2}^j \in Y$, let $p_{jlk}$ be the payoff that $V_{i-1}$
receives when $V_{i-2}$ plays $x_{i-2}^j$, $V_{i-1}$ plays $x_{i-1}^l$, and $V_i$ plays $x_i^k$. Clearly, $p_{jlk}$ can
be computed in constant time. If there exists an $x_{i-2}^j \in Y$ such that $z_{i-1}^{j,l} = 1$
and $p_{jlk} \ge T_{i-1} - \epsilon$, set $z_i^{l,k} = 1$. Otherwise, set $z_i^{l,k} = 0$.

Having computed $z_n^{l,k}$, $l, k = 1, \dots, N$, we check if $z_n^{l,k} = 1$ for some pair
$(l, k)$. If such a pair of indices exists, we instruct $V_n$ to play $x_n^k$ and use dynamic
programming techniques (or, equivalently, the downstream pass of the algorithm
of [12]) to find a Nash equilibrium $s'$ that satisfies $EP_{V_i}(s') \ge T_i - \epsilon$ for $i =
1, \dots, n$ (recall that $V_n$ is a dummy player, i.e., we assume $T_n = 0$, $EP_{V_n}(s') = 0$
for any choice of $s'$). If $z_n^{l,k} = 0$ for all $l, k = 1, \dots, N$, there is no discrete Nash
equilibrium $s'$ that satisfies $EP_{V_i}(s') \ge T_i - \epsilon$ for $i = 1, \dots, n$ and hence no Nash
equilibrium $s$ (not necessarily discrete) such that $EP_{V_i}(s) \ge T_i$ for $i = 1, \dots, n$.

The running time analysis is similar to that for Theorem 3; we conclude that
the running time of our algorithm is $O(nN^3) = O(\max\{nP_{\max}^3/\epsilon^3, n^4/\epsilon^3\})$.
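The feasibility version of the dynamic program can be sketched by propagating the set of reachable pairs of adjacent strategies. As before, this is an illustration under stated assumptions rather than the authors' code: `compatible` is a hypothetical oracle for "is a potential best response", `payoff` is supplied by the caller, players are numbered from 0, and the threshold of the last player is checked explicitly instead of treating that player as a dummy.

```python
def threshold_profile_exists(X, compatible, payoff, T, eps):
    """Decide whether some discrete profile meets all relaxed thresholds.

    X[i]                           -- discrete strategies of player i
    compatible(i, x_i, x_next)     -- hypothetical best-response oracle
    payoff(i, x_prev, x_i, x_next) -- expected payoff of player i (None at ends)
    T[i], eps                      -- player i must receive at least T[i] - eps
    """
    n = len(X)

    def meets(i, x_prev, x_own, x_next):
        return payoff(i, x_prev, x_own, x_next) >= T[i] - eps

    if n == 1:
        return any(meets(0, None, x, None) for x in X[0])

    # pairs holds the (x_{i-1}, x_i) combinations that can be completed to the
    # left so that players 0..i-1 are compatible and meet their thresholds.
    pairs = {(xa, xb) for xa in X[0] for xb in X[1]
             if compatible(0, xa, xb) and meets(0, None, xa, xb)}
    for i in range(2, n):
        pairs = {(xa, xb) for (xc, xa) in pairs for xb in X[i]
                 if compatible(i - 1, xa, xb) and meets(i - 1, xc, xa, xb)}
    return any(meets(n - 1, xa, xb, None) for (xa, xb) in pairs)
```

The set `pairs` plays the role of the indicator values $z_i^{l,k}$: a pair is present exactly when the corresponding indicator would be 1.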

Remark 1. Theorem 5 can be extended to trees of bounded degree in the same 
way as Theorem 4. 

5.1 Exact Computation 

Another approach to finding Nash equilibria with bounded payoffs is based on 
inductively computing the subsets of the best response policies of all players 
so as to exclude the points that do not provide sufficient payoffs to some of 
the players. Formally, we say that a strategy $v$ of the player $V$ is a potential
best response to a strategy $w$ of its parent $W$ with respect to a threshold vector
$T = (T_1, \dots, T_n)$ (denoted by $v \in \mathrm{pbr}_V(w, T)$) if there is an equilibrium in the
instance $G_{(V,W),W=w}$ in which $V$ plays mixed strategy $v$ and the payoff to any
player $V_i$ downstream of $V$ (including $V$) is at least $T_i$. The best response policy
for $V$ with respect to a threshold vector $T$ is defined as $B(W, V, T) = \{(w, v) \mid
v \in \mathrm{pbr}_V(w, T), w \in [0, 1]\}$.

It is easy to see that if any of the sets $B(V_j, V_{j-1}, T)$, $j = 1, \dots, n$, is empty, it
means that it is impossible to provide all players with expected payoffs prescribed
by $T$. Otherwise, one can apply the downstream pass of the original algorithm
of [12] to find a Nash equilibrium. As we assume that $V_n$ is a dummy vertex
whose payoff is identically 0, the Nash equilibrium with these payoffs exists as
long as $T_n \le 0$ and $B(V_n, V_{n-1}, T)$ is not empty.

Using the techniques developed in [9], it is not hard to show that for any
$j = 1, \dots, n$, the set $B(V_j, V_{j-1}, T)$ consists of a finite number of rectangles, and
moreover, one can compute $B(V_{j+1}, V_j, T)$ given $B(V_j, V_{j-1}, T)$. The advantage
of this approach is that it allows us to represent all Nash equilibria that provide
required payoffs to the players. However, it is not likely to be practical, since it
turns out that the rectangles that appear in the representation of $B(V_j, V_{j-1}, T)$
may have irrational coordinates.

Claim. There exists a graphical game $G$ on a 3-vertex path $UVW$ and a vector
$T = (T_1, T_2, T_3)$ such that $B(W, V, T)$ cannot be represented as a union of a
finite number of rectangles with rational coordinates.

Proof. We define the payoffs to the players in $G$ as follows. The payoff to $U$ is
identically 0, i.e., $P^0(U) = P^1(U) = 0$. Using Claim 2, we select the payoffs to
$V$ so that $P^0(V) = uw$, $P^1(V) = P^0(V) + w - .8u - .1$, where $u$ and $w$ are
the (mixed) strategies of $U$ and $W$, respectively. It follows that $V$ is indifferent
between playing 0 and 1 if and only if $w = f(u) = .8u + .1$; observe that for any
$u \in [0, 1]$ we have $f(u) \in [0, 1]$. It is not hard to see that we have

$B(W, V) = [0, .1] \times \{0\} \cup [.1, .9] \times [0, 1] \cup [.9, 1] \times \{1\}$.

The payoffs to $W$ are not important for our construction; for example, set
$P^0(W) = P^1(W) = 0$.

Now, set $T = (0, 1/8, 0)$, i.e., we are interested in Nash equilibria in which $V$'s
expected payoff is at least 1/8. Suppose $w \in [0, 1]$. The player $V$ can play a mixed
strategy $v$ when $W$ is playing $w$ as long as $U$ plays $u = f^{-1}(w) = 5w/4 - 1/8$
(to ensure that $V$ is indifferent between 0 and 1) and $P^0(V) = P^1(V) = uw =
w(5w/4 - 1/8) \ge 1/8$. The latter condition is satisfied if $w \le (1 - \sqrt{41})/20 < 0$
or $w \ge (1 + \sqrt{41})/20$. Note that we have $.1 < (1 + \sqrt{41})/20 < .9$. For any other
value of $w$, any strategy of $U$ either makes $V$ prefer one of the pure strategies
or does not provide it with a sufficient expected payoff. There are also some
values of $w$ for which $V$ can play a pure strategy (0 or 1) as a potential best
response to $W$ and guarantee itself an expected payoff of at least 1/8; it can
be shown that these values of $w$ form a finite number of segments in $[0, 1]$. We
conclude that any representation of $B(W, V, T)$ as a union of a finite number of
rectangles must contain a rectangle of the form $[(1 + \sqrt{41})/20, w''] \times [v', v'']$ for
some $w'', v', v'' \in [0, 1]$.
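The arithmetic behind the irrational endpoint can be verified directly: the condition $w(5w/4 - 1/8) \ge 1/8$ is equivalent to $10w^2 - w - 1 \ge 0$, whose larger root is $(1 + \sqrt{41})/20$. The sketch below only checks this computation numerically; the names `W_STAR`, `indifference_strategy` and `payoff_to_v` are illustrative.

```python
import math

# Larger root of 10 w^2 - w - 1 = 0, the threshold derived in the proof.
W_STAR = (1 + math.sqrt(41)) / 20


def indifference_strategy(w):
    """Strategy u = f^{-1}(w) of U that makes V indifferent when W plays w."""
    return 5 * w / 4 - 1 / 8


def payoff_to_v(w):
    """Payoff u * w that V receives when it is indifferent and W plays w."""
    return indifference_strategy(w) * w
```

Since 41 is not a perfect square, $\sqrt{41}$ is irrational, which is what forces an irrational coordinate in the rectangle.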

On the other hand, it can be shown that for any integer payoff matrices and
threshold vectors and any $j = 1, \dots, n - 1$, the sets $B(V_{j+1}, V_j, T)$ contain no
rectangles of the form $[u', u''] \times \{v\}$ or $\{v\} \times [w', w'']$, where $v \in \mathbb{R} \setminus \mathbb{Q}$. This means
that if $B(V_n, V_{n-1}, T)$ is non-empty, i.e., there is a Nash equilibrium with payoffs
prescribed by $T$, then the downstream pass of the algorithm of [12] can always
pick a strategy profile that forms a Nash equilibrium, provides a payoff of at
least $T_i$ to the player $V_i$, and has no irrational coordinates. Hence, unlike in the
case of the Nash equilibrium that maximizes the social welfare, working with
irrational numbers is not necessary, and the fact that the algorithm discussed in
this section has to do so can be seen as an argument against using this approach.

6 Other Criteria for Selecting a Nash Equilibrium 

In this section, we consider several other criteria that can be useful in selecting 
a Nash equilibrium. 

6.1 Combining welfare maximization with bounds on payoffs

In many real life scenarios, we want to maximize the social welfare subject to
certain restrictions on the payoffs to individual players. For example, we may
want to ensure that no player gets a negative expected payoff, or that the expected
payoff to player $i$ is at least $P^i_{\max} - \xi$, where $P^i_{\max}$ is the maximum entry
of $i$'s payoff matrix and $\xi$ is a fixed parameter. Formally, given a graphical game
$G$ and a vector $T_1, \dots, T_n$, let $S$ be the set of all Nash equilibria $s$ of $G$ that
satisfy $T_i \le EP_{V_i}(s)$ for $i = 1, \dots, n$, and let $\hat{s} = \mathrm{argmax}_{s \in S} EP(s)$.

If the set $S$ is non-empty, we can find a Nash equilibrium $s'$ that is $\epsilon$-close
to satisfying the payoff bounds and is within $\epsilon$ of $\hat{s}$ with respect to the total
payoff by combining the algorithms of Section 4 and Section 5.

Namely, for a given $\epsilon > 0$, choose $\delta$ as in the proof of Theorem 3, and let
$X_i$ be the set of all discrete strategies of player $V_i$ (for a formal definition, see
the proof of Theorem 3). Combining the proofs of Theorem 3 and Theorem 5,
we can see that the strategy profile $t$ given by $t_i = \max\{x_i^j \mid x_i^j \le \hat{s}_i\}$ satisfies
$EP_{V_i}(t) \ge T_i - \epsilon$, $|EP(\hat{s}) - EP(t)| \le \epsilon$.

Define $m_i^{l,k}$ to be the maximum total payoff that $V_1, \dots, V_{i-1}$ can achieve if
each $V_j$, $j \le i$, chooses a strategy from $X_j$, for each $j < i$ the strategy of $V_j$
is a potential best response to the strategy of $V_{j+1}$ and the payoff to player $V_j$
is at least $T_j - \epsilon$, and, moreover, $V_{i-1}$ plays $x_{i-1}^l$, $V_i$ plays $x_i^k$. If there is no
way to choose the strategies for $V_1, \dots, V_{i-1}$ to satisfy these conditions, we set
$m_i^{l,k} = -\infty$. The $m_i^{l,k}$ can be computed by dynamic programming similarly to
the $m_i^{l,k}$ and $z_i^{l,k}$ in the proofs of Theorems 3 and 5. Finally, as in the proof of
Theorem 3, we use $m_n^{l,k}$ to select the best discrete Nash equilibrium subject to
the payoff constraints.

Even more generally, we may want to maximize the total payoff to a subset 
of players (who are assumed to be able to redistribute the profits fairly among 
themselves) while guaranteeing certain expected payoffs to (a subset of) the 
other players. This problem can be handled similarly. 

6.2 A minimax approach 

A more egalitarian measure of the quality of a Nash equilibrium is the minimal 
expected payoff to a player. The optimal solution with respect to this measure is a 
Nash equilibrium in which the minimal expected payoff to a player is maximal. To 
find an approximation to such a Nash equilibrium, we can combine the algorithm 
of Section 5 with binary search on the space of potential lower bounds. Note 
that the expected payoff to any player $V_i$ given a strategy $s$ always satisfies

$-P_{\max} \le EP_{V_i}(s) \le P_{\max}$.

For a fixed $\epsilon > 0$, we start by setting $T' = -P_{\max}$, $T'' = P_{\max}$, $T^* =
(T' + T'')/2$. We then run the algorithm of Section 5 with $T_1 = \dots = T_n = T^*$. If
the algorithm succeeds in finding a Nash equilibrium $s'$ that satisfies $EP_{V_i}(s') \ge
T^* - \epsilon$ for all $i = 1, \dots, n$, we set $T' = T^*$, $T^* = (T' + T'')/2$; otherwise, we set
$T'' = T^*$, $T^* = (T' + T'')/2$ and loop. We repeat this process until $|T' - T''| \le \epsilon$.
It is not hard to check that for any $p \in \mathbb{R}$, if there is a Nash equilibrium $s$ such
that $\min_{i=1,\dots,n} EP_{V_i}(s) \ge p$, then our algorithm outputs a Nash equilibrium $s'$
that satisfies $\min_{i=1,\dots,n} EP_{V_i}(s') \ge p - 2\epsilon$. The running time of our algorithm is
$O(\max\{nP_{\max}^3 \log \epsilon^{-1}/\epsilon^3, n^4 \log \epsilon^{-1}/\epsilon^3\})$.
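The binary search over the common threshold can be sketched as follows. The algorithm of Section 5 is abstracted into a caller-supplied function `solve`, which is a hypothetical interface: it is assumed to return a profile whose payoffs are all at least the given threshold minus `eps`, or `None` if it finds none.

```python
def maximin_search(p_max, eps, solve):
    """Binary search for the largest common payoff threshold that `solve` accepts.

    solve(threshold, eps) -- hypothetical stand-in for the algorithm of
                             Section 5 run with T_1 = ... = T_n = threshold.
    Returns (profile, lower), where lower is the last accepted threshold
    (or -p_max if none was accepted) and profile the matching output.
    """
    lower, upper = -p_max, p_max          # T' and T'' in the text
    best = None
    while upper - lower > eps:
        mid = (lower + upper) / 2         # T* in the text
        result = solve(mid, eps)
        if result is not None:
            best, lower = result, mid
        else:
            upper = mid
    return best, lower
```

The loop halves the interval each time, so the number of calls to `solve` is logarithmic in $P_{\max}/\epsilon$.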

6.3 Equalizing the payoffs



When the players' payoff matrices are not very different, it is reasonable to 
demand that the expected payoffs to the players do not differ by much either. 
We will now show that Nash equilibria in this category can be approximated in 
polynomial time as well. 

Indeed, observe that the algorithm of Section 5 can be easily modified to
deal with upper bounds on individual payoffs rather than lower bounds. Moreover,
we can efficiently compute an approximation to a Nash equilibrium that
satisfies both the upper bound and the lower bound for each player. More precisely,
suppose that we are given a graphical game $G$, $2n$ rational numbers
$T_1, \dots, T_n, T'_1, \dots, T'_n$ and $\epsilon > 0$. Then if there exists a strategy profile $s$ such
that $s$ is a Nash equilibrium for $G$ and $T_i \le EP_{V_i}(s) \le T'_i$ for $i = 1, \dots, n$, we can
find a strategy profile $s'$ such that $s'$ is a Nash equilibrium for $G$ and $T_i - \epsilon \le
EP_{V_i}(s') \le T'_i + \epsilon$ for $i = 1, \dots, n$. The modified algorithm also runs in time
$O(\max\{nP_{\max}^3/\epsilon^3, n^4/\epsilon^3\})$.

This observation allows us to approximate Nash equilibria in which all players'
expected payoffs differ by at most $\xi$ for any fixed $\xi > 0$. Given an $\epsilon > 0$,
we set $T_1 = \dots = T_n = -P_{\max}$, $T'_1 = \dots = T'_n = -P_{\max} + \xi + \epsilon$, and run
the modified version of the algorithm of Section 5. If it fails to find a solution,
we increment all $T_i$, $T'_i$ by $\epsilon$ and loop. We continue until the algorithm finds a
solution, or $T_i > P_{\max}$.

Suppose that there exists a Nash equilibrium $s$ that satisfies
$|EP_{V_i}(s) - EP_{V_j}(s)| \le \xi$ for all $i, j = 1, \dots, n$. Set $r = \min_{i=1,\dots,n} EP_{V_i}(s)$;
we have $r \le EP_{V_i}(s) \le r + \xi$ for all $i = 1, \dots, n$. There exists a $k > 0$ such
that $-P_{\max} + (k - 1)\epsilon \le r \le -P_{\max} + k\epsilon$. During the $k$th step of the algorithm,
we set $T_1 = \dots = T_n = -P_{\max} + (k - 1)\epsilon$, i.e., we have $r - \epsilon \le T_i \le r$,
$r + \xi \le T'_i \le r + \xi + \epsilon$. That is, the Nash equilibrium $s$ satisfies $T_i \le r \le
EP_{V_i}(s) \le r + \xi \le T'_i$, which means that when $T_i$ is set to $-P_{\max} + (k - 1)\epsilon$, our
algorithm is guaranteed to output a Nash equilibrium $t$ that satisfies $r - 2\epsilon \le
T_i - \epsilon \le EP_{V_i}(t) \le T'_i + \epsilon \le r + \xi + 2\epsilon$. We conclude that whenever such a Nash
equilibrium $s$ exists, our algorithm outputs a Nash equilibrium $t$ that satisfies
$|EP_{V_i}(t) - EP_{V_j}(t)| \le \xi + 4\epsilon$ for all $i, j = 1, \dots, n$. The running time of this
algorithm is $O(\max\{nP_{\max}^3/\epsilon^4, n^4/\epsilon^4\})$.
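The sweep over payoff windows can be sketched as a simple loop. The two-sided variant of the Section 5 algorithm is abstracted into a caller-supplied function `solve_two_sided`, which is a hypothetical interface: it is assumed to return a profile whose payoffs all lie in `[lower - eps, upper + eps]`, or `None` if it finds none.

```python
def equalize_sweep(p_max, xi, eps, solve_two_sided):
    """Slide a window of width xi + eps upward in steps of eps.

    solve_two_sided(lower, upper, eps) -- hypothetical stand-in for the
        modified algorithm of Section 5 with T_i = lower and T'_i = upper.
    Returns the first profile found, or None once lower exceeds p_max.
    """
    step = 0
    while True:
        # Recompute from the step index to avoid accumulating rounding error.
        lower = -p_max + step * eps
        if lower > p_max:
            return None
        result = solve_two_sided(lower, lower + xi + eps, eps)
        if result is not None:
            return result
        step += 1
```

The loop makes at most about $2P_{\max}/\epsilon$ calls, which accounts for the extra factor of $1/\epsilon$ in the running time stated above.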

Note also that we can find the smallest $\xi$ for which such a Nash equilibrium
exists by combining this algorithm with binary search over the space
$\xi \in [0, 2P_{\max}]$. This identifies an approximation to the "fairest" Nash equilibrium,
i.e., one in which the players' expected payoffs differ by the smallest
possible amount.

Finally, note that all results in this section can be extended to bounded- 
degree trees. 

7 Conclusions 

We have studied the problem of equilibrium selection in graphical games on
bounded-degree trees. We considered several criteria for selecting a Nash equilibrium,
such as maximizing the social welfare, ensuring a lower bound on the
expected payoff of each player, etc. First, we focused on the algebraic complexity
of a social welfare-maximizing Nash equilibrium, and proved strong negative
results for that problem. Namely, we showed that even for graphical games on
paths, any algebraic number $\alpha \in [0, 1]$ may be the only strategy available to some
player in all social welfare-maximizing Nash equilibria. This is in sharp contrast
with the fact that graphical games on trees always possess a Nash equilibrium
in which all players' strategies are rational numbers.

We then provided approximation algorithms for selecting Nash equilibria
with special properties. While the problem of finding approximate Nash equilibria
for various classes of games has received a lot of attention in recent years,
most of the existing work aims to find $\epsilon$-Nash equilibria that satisfy (or are
$\epsilon$-close to satisfying) certain properties. Our approach is different in that we insist
on outputting an exact Nash equilibrium, which is $\epsilon$-close to satisfying a given
requirement. As argued in the introduction, there are several reasons to prefer
a solution that constitutes an exact Nash equilibrium.

Our algorithms are fully polynomial time approximation schemes, i.e., their
running time is polynomial in the inverse of the approximation parameter $\epsilon$,
though they may be pseudopolynomial with respect to the input size. Under
mild restrictions on the inputs, they can be modified to be truly polynomial. 
This is the strongest positive result one can derive for a problem whose exact 
solutions may be hard to represent, as is the case for many of the problems 
considered here. While we prove most of our results for games on a path, they 
can be generalized to any tree for which the best response policies have compact 
representations as unions of rectangles. In the appendix, we show how to do 
this for the algorithm that finds a payoff-maximizing Nash equilibrium; other 
algorithms can be treated similarly. 

Further work in this vein could include extensions to the kinds of guarantees 
sought for Nash equilibria, such as guaranteeing total payoffs for subsets of play- 
ers, selecting equilibria in which some players are receiving significantly higher 
payoffs than their peers, etc. At the moment however, it is perhaps more impor- 
tant to investigate whether Nash equilibria of graphical games can be computed 
in a decentralized manner, in contrast to the algorithms we have introduced 
here. 

It is natural to ask if our results or those of [9] can be generalized to games
with three or more actions. However, it seems that this will make the analysis
significantly more difficult. In particular, note that one can view the bounded
payoff games as a very limited special case of games with three actions per player.
Namely, given a two-action game with payoff bounds, consider a game in which
each player $V_i$ has a third action that guarantees him a payoff of $T_i$ no matter
what everyone else does. Then checking if there is a Nash equilibrium in which
none of the players assigns a non-zero probability to his third action is equivalent
to checking if there exists a Nash equilibrium that satisfies the payoff bounds in
the original game, and Section 5.1 shows that finding an exact solution to this
problem requires new ideas.

Alternatively, it may be interesting to look for similar results in the context
of correlated equilibria (CE), especially since the best CE may have higher value
(total expected payoff) than the best NE. The ratio between these values is
called the mediation value in [1]. It is known from [1] that the mediation value
of 2-player, 2-action games with non-negative payoffs is at most 4/3, and they
exhibit a 3-player game for which it is infinite. Furthermore, a 2-player, 3-action
example from [1] also has infinite mediation value.

8 Appendix



Theorem 4. Let $G$ be an $n$-player graphical game on a tree in which each node
has at most $\Delta$ children. Suppose we are given a set of best-response policies
for $G$ in which each best-response policy $B(V, U)$ is represented by a set of
rectangles with at most $q(n)$ event points. For any $\epsilon > 0$, there is an algorithm
that constructs a Nash equilibrium $s'$ for $G$ that satisfies $EP(s') \ge M(G) - \epsilon$.
The running time of the algorithm is polynomial in $n$, $P_{\max}$ and $\epsilon^{-1}$ provided
that the tree has bounded degree (that is, $\Delta = O(1)$) and $q(n)$ is a polynomial
in $n$. In particular, if

$N = \max((\Delta + 1)q(n) + 1,\; n2^{\Delta+2}(\Delta + 2)P_{\max}\epsilon^{-1})$

and $\Delta > 1$ then the running time is $O(n\Delta(2N)^{\Delta})$.

Proof. Let $\delta = 1/N$ and let $X = \{j\delta \mid j = 0, \dots, N\}$. Consider the set of
best-response policies for $G$. We say that a point $v \in [0, 1]$ is an event point for a
player $V$ with parent $W$ if either $v$ is a $V$-event point of $B(W, V)$ or, for some
child $U$ of $V$, $v$ is a $V$-event point of $B(V, U)$. Let $X'_V$ be the set of event points
for $V$. Since $N \ge (\Delta + 1)q(n) + 1$, $|X'_V| \le N - 1$. Let $X_V$ be a size-$2N$ superset
of $X \cup X'_V$.

We will refer to the strategies in $X_V$ as discrete strategies of player V. A
strategy profile in which each player has a discrete strategy will be referred to
as a discrete strategy profile. We will now show that there is a discrete strategy
profile t which is a Nash equilibrium and in which the total payoff is at least
$M(G) - \epsilon$.

Let s be a Nash equilibrium that maximizes social welfare (so $EP(s) = M(G)$).
For every player V, let $t_V = \max\{x_V \in X_V \mid x_V \le s_V\}$.
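The rounding step $t_V = \max\{x_V \in X_V \mid x_V \le s_V\}$ can be sketched in Python as a binary search over a sorted discrete strategy set. The helper name `round_down` and the list representation are assumptions of this illustration, not part of the paper.

```python
from bisect import bisect_right


def round_down(discrete_set, s_v):
    """Return max{x in discrete_set : x <= s_v}.

    discrete_set must be sorted in ascending order and contain 0, so that
    a valid answer exists for every s_v in [0, 1].
    """
    i = bisect_right(discrete_set, s_v)
    if i == 0:
        raise ValueError("no discrete strategy at or below s_v")
    return discrete_set[i - 1]
```

Because the discrete set contains every multiple of $\delta$, the rounded value is within $\delta$ of the original strategy, which is the hypothesis needed by Lemma 4.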

First, we will show that the strategy profile t is a Nash equilibrium for G.
Consider a player V with parent W. Let $R = [w_1, w_2] \times [v_1, v_2]$ be the
rectangle in B(W, V) that contains $(s_W, s_V)$. As $w_1$ is a W-event point of
B(W, V), we have $w_1 \le t_W$ and $t_W \le s_W \le w_2$, so the point
$(t_W, s_V)$ is inside R. Similarly, the point $v_1$ is a V-event point of
B(W, V), so we have $v_1 \le t_V$, and therefore the point $(t_W, t_V)$ is
inside R. This means that for any player V with parent W, we have
$t_V \in \mathrm{pbr}_V(t_W)$, which implies that t is a Nash equilibrium for G.

Now, let us estimate the expected loss in social welfare caused by playing t 
instead of s. 

Lemma 4. For any pair of strategy profiles t, s such that, for all players V,
$|t_V - s_V| \le \delta$, we also have

$$|EP_V(s) - EP_V(t)| \le 2^{\Delta+2}(\Delta+2)P_{\max}\delta$$

for all V.

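Proof. Fix any player $U_1$. Let $U_0$ be his parent, and let
$U_2, \dots, U_{d+1}$ be his children. For $\sigma \in \{0, 1\}^{d+2}$, let
$P_\sigma$ be the payoff to $U_1$ when, for $\ell \in \{0, \dots, d+1\}$,
$U_\ell$ plays $\sigma_\ell$. Let $t_\sigma$ be the probability that this event
occurs according to strategy profile t. Formally, let

$$\Psi(z, b) = \begin{cases} z & \text{if } b = 1, \\ 1 - z & \text{if } b = 0. \end{cases}$$

Then

$$t_\sigma = \prod_{\ell=0}^{d+1} \Psi(t_{U_\ell}, \sigma_\ell).$$

Similarly, let $s_\sigma$ be the probability that this event occurs according to
strategy profile s. Then

$$|EP_{U_1}(s) - EP_{U_1}(t)| = \Bigl|\sum_\sigma P_\sigma s_\sigma - \sum_\sigma P_\sigma t_\sigma\Bigr| \le 2^{d+2} P_{\max} \max_\sigma |t_\sigma - s_\sigma|.$$

We will now show that, for any $\sigma$,
$|t_\sigma - s_\sigma| \le (\Delta+2)\delta$, which implies the lemma since
$d \le \Delta$. For $r \in \{0, \dots, d+2\}$, let

$$z_{r,\sigma} = \prod_{\ell=0}^{r-1} \Psi(t_{U_\ell}, \sigma_\ell) \prod_{\ell=r}^{d+1} \Psi(s_{U_\ell}, \sigma_\ell).$$

Since $|t_{U_\ell} - s_{U_\ell}| \le \delta$, we have, for any $j \in \{0, 1\}$,
$|\Psi(t_{U_\ell}, j) - \Psi(s_{U_\ell}, j)| \le \delta$. The products
$z_{\ell,\sigma}$ and $z_{\ell-1,\sigma}$ differ only in the factor for
$U_{\ell-1}$, and every other factor lies in $[0, 1]$, so
$|z_{\ell,\sigma} - z_{\ell-1,\sigma}| \le \delta$. Then

$$|t_\sigma - s_\sigma| = |z_{d+2,\sigma} - z_{0,\sigma}| = \Bigl|\sum_{\ell=1}^{d+2} (z_{\ell,\sigma} - z_{\ell-1,\sigma})\Bigr| \le \sum_{\ell=1}^{d+2} |z_{\ell,\sigma} - z_{\ell-1,\sigma}| \le (\Delta+2)\delta.$$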
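The telescoping bound can be checked numerically on small instances. The Python sketch below evaluates the outcome probabilities for two nearby strategy profiles and reports the largest gap over all action vectors; the function names (`psi`, `outcome_prob`, `max_outcome_gap`) and the sample numbers in the usage note are illustrative assumptions, and such a check is of course no substitute for the proof.

```python
from itertools import product


def psi(z, b):
    """Probability that a player whose mixed strategy is z plays action b."""
    return z if b == 1 else 1.0 - z


def outcome_prob(profile, sigma):
    """Probability that player l plays sigma[l] for every l."""
    p = 1.0
    for z, b in zip(profile, sigma):
        p *= psi(z, b)
    return p


def max_outcome_gap(s, t):
    """Largest |t_sigma - s_sigma| over all action vectors sigma."""
    k = len(s)
    return max(abs(outcome_prob(t, sig) - outcome_prob(s, sig))
               for sig in product((0, 1), repeat=k))
```

For instance, with `s = [0.3, 0.7, 0.55, 0.9]`, `t = [0.25, 0.68, 0.5, 0.9]` and $\delta = 0.05$, the lemma's argument guarantees that `max_outcome_gap(s, t)` is at most `len(s) * 0.05`.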



So by Lemma 4, the expected loss in social welfare caused by playing t
instead of s is at most $n 2^{\Delta+2}(\Delta+2)P_{\max}\delta$. Since
$N = 1/\delta$ is at least $n 2^{\Delta+2}(\Delta+2)P_{\max}\epsilon^{-1}$, this
is at most $\epsilon$.

We will now show how to find the best discrete Nash equilibrium (with
respect to the social welfare) using dynamic programming. As t is a discrete
strategy profile, the strategy profile found by our algorithm will be at least as
good as t.

For a player V with d children and a string $\tau \in \{1, \dots, 2N\}^{d+1}$,
define $m_{V,\tau}$ to be the maximum total payoff that proper descendants of V
can achieve in a strategy profile satisfying the following conditions, where
$x_V$ denotes the $\tau_0$'th strategy from the discrete strategy set $X_V$ of
player V, and for every $\ell \in \{1, \dots, d\}$, $U_\ell$ is the $\ell$'th
child of V and $x_{U_\ell}$ is the $\tau_\ell$'th strategy from the discrete
strategy set $X_{U_\ell}$.

- V plays $x_V$.

- For every $\ell \in \{1, \dots, d\}$, $U_\ell$ plays $x_{U_\ell}$.

- All descendants of V play strategies from their discrete strategy sets.

- For every proper descendant U of V, the strategy chosen by U is a potential
best response to the strategy chosen by its parent.

If no strategy profile satisfies these conditions, $m_{V,\tau} = -\infty$.

For any leaf V of the tree and any string $\tau$, $m_{V,\tau} = 0$. Here is how
the algorithm computes $m_{V,\tau}$ for an internal vertex V, assuming it has
already computed $m_{U,\tau'}$ for all proper descendants U of V and all strings
$\tau'$. First, it checks every child $U_\ell$ of V, to see whether
$x_{U_\ell} \in \mathrm{pbr}_{U_\ell}(x_V)$. If this is not the case, the
algorithm sets $m_{V,\tau} = -\infty$ and finishes. Now let $r_\ell$ denote the
number of children of $U_\ell$ (which may be zero) and let $m_\ell$ equal

$$\max\bigl\{ m_{U_\ell,\tau'} + P_{\ell,\tau'} \bigm| \tau' \in \{1, \dots, 2N\}^{r_\ell+1},\ \tau'_0 = \tau_\ell \bigr\},$$

where $P_{\ell,\tau'}$ denotes the payoff to $U_\ell$ when V plays $x_V$ and
$U_\ell$ and its children play according to $\tau'$ (the ith child of $U_\ell$
plays the $\tau'_i$'th element of its discrete strategy set and $U_\ell$ plays
$x_{U_\ell}$). Then

$$m_{V,\tau} = \sum_{\ell=1}^{d} m_\ell,$$

and this can be computed in polynomial time by considering each possible
$\tau'$ (at most $(2N)^{\Delta}$ of them) for each child $U_\ell$ (of which
there are at most $\Delta$).

Finally, suppose that we have computed all of the values $m_{V,\tau}$. We assumed
without loss of generality (see Section 2) that the root of the tree is a node V
which has constant payoff 0, independently of the action chosen by its singleton
child $U_1$. To find the discrete strategy profile that maximizes social welfare,
we just choose the $\tau \in \{1, \dots, 2N\}^2$ that maximizes $m_{V,\tau}$.
This value, $m_{V,\tau}$, is the social welfare achieved by the algorithm. The
discrete strategy profile that achieves this social welfare can now be
constructed by standard dynamic programming techniques.
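The recurrence for $m_{V,\tau}$ can be sketched in Python as a memoized recursion. This is a minimal illustration under stated assumptions rather than the authors' implementation: the tree is given as a `children` dictionary, the discrete strategy sets as lists in `strategies`, and the best-response policies and payoff tables are abstracted into two hypothetical callables, `is_pbr` and `payoff`. Indices are 0-based here, whereas the text uses $\{1, \dots, 2N\}$.

```python
from functools import lru_cache
from itertools import product

NEG_INF = float("-inf")


def best_discrete_welfare(root, children, strategies, is_pbr, payoff):
    """Return (welfare, tau) for the best discrete Nash equilibrium.

    Hypothetical inputs for this sketch:
      children[v]   -> list of v's children
      strategies[v] -> list of v's discrete strategies
      is_pbr(u, x_u, x_parent) -> True if x_u is a potential best
                       response of u when its parent plays x_parent
      payoff(u, x_parent, x_u, child_xs) -> expected payoff to u
    The root is assumed to be the dummy node with constant payoff 0 and
    a single child.
    """

    @lru_cache(maxsize=None)
    def m(v, tau):
        # tau[0] indexes v's strategy, tau[k + 1] that of v's k-th child.
        kids = children[v]
        if not kids:
            return 0.0  # a leaf has no proper descendants
        x_v = strategies[v][tau[0]]
        total = 0.0
        for k, u in enumerate(kids):
            x_u = strategies[u][tau[k + 1]]
            if not is_pbr(u, x_u, x_v):
                return NEG_INF
            grandkids = children[u]
            best = NEG_INF
            # Enumerate tau' with tau'_0 fixed to u's strategy index.
            for rest in product(*(range(len(strategies[c])) for c in grandkids)):
                below = m(u, (tau[k + 1],) + rest)
                if below == NEG_INF:
                    continue
                xs = [strategies[c][i] for c, i in zip(grandkids, rest)]
                best = max(best, below + payoff(u, x_v, x_u, xs))
            if best == NEG_INF:
                return NEG_INF
            total += best
        return total

    (only_child,) = children[root]
    best_val, best_tau = NEG_INF, None
    for tau in product(range(len(strategies[root])),
                       range(len(strategies[only_child]))):
        val = m(root, tau)
        if val > best_val:
            best_val, best_tau = val, tau
    return best_val, best_tau
```

The function returns only the optimal value and the maximizing string at the root; recovering the full strategy profile would additionally require recording the maximizing $\tau'$ at each vertex, which this sketch omits.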